Add RNG to Dropout #1618

DhairyaLGandhi · 2021-06-15T10:05:00Z

darsnack · 2021-06-15T12:10:09Z

src/layers/normalise.jl

-function dropout_mask(x, p; dims=:)
-  y = rand!(similar(x, _dropout_shape(x, dims)))
+function dropout_mask(rng::AbstractRNG, x, p; dims=:)
+  y = rand!(rng, similar(x, _dropout_shape(x, dims)))
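A minimal standalone sketch of what the revised `dropout_mask` computes on the CPU, using only the `Random` standard library. `sketch_dropout_mask` and `sketch_dropout_kernel` are hypothetical names, not Flux's API. `dims` handling is left out, and the `1/(1 - p)` scaling is an assumption based on the `_dropout_kernel.(y, p, 1 - p)` call quoted later in this PR.

```julia
using Random

# Hypothetical kernel: a uniform draw above p keeps the element (scaled by 1/q);
# anything else drops it to zero.
sketch_dropout_kernel(y::T, p, q) where {T} = y > p ? T(1 / q) : T(0)

# Hypothetical mask builder: the RNG is threaded explicitly into rand!, as in the diff.
function sketch_dropout_mask(rng::AbstractRNG, x::AbstractArray, p)
    y = rand!(rng, similar(x))                 # uniform draws in [0, 1) from `rng`
    y .= sketch_dropout_kernel.(y, p, 1 - p)   # 0 with probability p, 1/(1 - p) otherwise
    return y
end

x = ones(Float32, 4, 3)
m1 = sketch_dropout_mask(MersenneTwister(0), x, 0.5)
m2 = sketch_dropout_mask(MersenneTwister(0), x, 0.5)
```

Because the RNG is an explicit argument, two identically seeded RNGs give identical masks. The open question in the thread is what `rng` should be when `x` is a `CuArray`.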


I don't think this dispatches to the GPU intrinsic when rng is specified. @maleadt?

We only override default_rng; I haven't looked into getting GLOBAL_RNG to work (maybe it does).

We can switch it to default_rng too, of course.

But you have to call it on the GPU. You can't call default_rng in the Dropout ctor and forward that RNG object to the GPU (unless Dropout objects are created in device code?).

I guess it depends on whom we give the responsibility of providing a valid RNG (the user or Flux): a proverbial gpu(rng).

I don't really follow the relevance of the linked comment. AFAIK there are no GPU-compatible RNGs; there are simply the RNGs provided by CUDA.jl/GPUArrays.jl. to_gpu is similar to trainable: it allows us to define how to map each leaf in the structure to the GPU without defining CUDA.cu on types that it shouldn't be defined on. It's out of necessity, but if you can think of an alternative, go for it (I'm not tied to to_gpu at all).

I was referring to exactly the ones provided by CUDA/CURAND. Sorry if that wasn't clear.

So I don't understand your comment then. This provides a mapping from GLOBAL_RNG to CUDA.CURAND.default_rng(). And when a user does Dropout |> gpu with any other CPU RNG, it throws an error instead of silently ignoring it.

I think the most sensible place to error is when the user tries m |> gpu on a model that contains Dropout with CPU RNGs that have no GPU equivalent. It would be weird for m |> gpu to run successfully when the model isn't executable as is on the GPU. Either way, I don't mind how we do it: as long as we throw an error instead of silently using a CUDA RNG when x isa CuArray, it's fine by me.

Right, it will error anyway; it's clearer when we can say that CUDA needs a way to dispatch to a CUDA RNG. I agree on that.
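The "map the default RNG, error on everything else" policy discussed above can be sketched as a conversion hook that runs when a model is moved to the device. `FakeDeviceRNG` and `to_device_rng` are hypothetical: the former stands in for something like `CUDA.CURAND.default_rng()`, and no real Flux or CUDA.jl function is implied.

```julia
using Random

# Hypothetical stand-in for a device RNG such as CUDA.CURAND.default_rng().
struct FakeDeviceRNG <: AbstractRNG end

# Hypothetical hook, run when a model is moved to the device (e.g. by `m |> gpu`):
# the default CPU RNG has a device equivalent; any other CPU RNG is rejected eagerly,
# so the error surfaces at conversion time rather than inside the forward pass.
function to_device_rng(rng::AbstractRNG)
    rng === Random.default_rng() && return FakeDeviceRNG()
    throw(ArgumentError("no device equivalent for $(typeof(rng)); use the default RNG"))
end
```

With this design, `m |> gpu` fails loudly for a `MersenneTwister` held by a layer instead of silently swapping in a CUDA RNG.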

CarloLucibello · 2021-06-15T15:21:39Z

LGTM; it just needs tests and handling of the GPU case.
A slightly different interface could be given by using a keyword argument for the layer constructor, Dropout(p; rng=GLOBAL_RNG).
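A minimal sketch of that keyword-argument interface, assuming the layer stores its RNG and draws the mask from it in the forward pass. `SketchDropout` is a hypothetical type, not Flux's `Dropout`. `Random.default_rng()` is used as the default (the PR later switches from `GLOBAL_RNG` to `default_rng`), and `dims` and train/test mode are omitted.

```julia
using Random

# Hypothetical layer mirroring the suggested Dropout(p; rng = ...) interface.
struct SketchDropout{F,R<:AbstractRNG}
    p::F
    rng::R
end

# Keyword constructor: the RNG defaults to the default RNG.
SketchDropout(p; rng::AbstractRNG = Random.default_rng()) = SketchDropout(p, rng)

# Forward pass: the mask comes from the stored RNG, not from a global one.
function (d::SketchDropout)(x::AbstractArray)
    T = eltype(x)
    keep = rand(d.rng, T, size(x)) .> d.p   # true with probability 1 - p
    return x .* keep ./ T(1 - d.p)          # inverted dropout: rescale the survivors
end

layer = SketchDropout(0.3; rng = MersenneTwister(1))
y = layer(ones(Float32, 5))
```

At the call site, `rng = ...` is self-describing, which is the readability argument made for keywords further down.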

ablaom · 2021-06-15T21:30:55Z

See also #1372

DhairyaLGandhi · 2021-06-16T06:55:26Z

Switched out GLOBAL_RNG for default_rng. @maleadt, do we need to do something differently for it to pick up CUDA.CURAND.default_rng()?

Manually adding a dispatch seems to go against the "write your kernel once and run anywhere" process.

CarloLucibello · 2021-06-16T07:38:58Z

src/layers/normalise.jl

  y .= _dropout_kernel.(y, p, 1 - p)
  return y
 end

 """
-    Dropout(p; dims=:)
+    Dropout([rng = default_rng()], p; dims=:)


DataLoader takes rng as a keyword argument

Flux.jl/src/data/dataloader.jl

Line 70 in 335286a

function DataLoader(data; batchsize=1, shuffle=false, partial=true, rng=GLOBAL_RNG)

We may want to do the same here for consistency.
For the functional form dropout, on the other hand, it is fine to take rng as the first positional argument.
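For the functional form, taking the RNG as an optional first positional argument follows Julia conventions such as `rand(rng, ...)` and `shuffle(rng, ...)`. Here is a sketch under that assumption. `sketch_dropout` is a hypothetical name, not Flux's `dropout`, and `dims` and `active` are omitted.

```julia
using Random

# Hypothetical functional dropout with the RNG as an optional first positional argument.
function sketch_dropout(rng::AbstractRNG, x::AbstractArray, p)
    T = eltype(x)
    keep = rand(rng, T, size(x)) .> p   # keep each element with probability 1 - p
    return x .* keep ./ T(1 - p)
end

# Omitting the RNG falls back to the default one.
sketch_dropout(x::AbstractArray, p) = sketch_dropout(Random.default_rng(), x, p)

out = sketch_dropout(MersenneTwister(3), rand(Float32, 4), 0.5)
```

The two methods also show the "duplicated constructor" debated below: the RNG-less method only forwards to the RNG-taking one.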

The PR seems easier to parse and cleaner API-wise. Good point on the data loaders; we should revisit them separately.

The PR seems easier to parse and cleaner API-wise.

I don't agree with either of those. Actually, having rng as the first positional argument forced you to duplicate the constructor. And I generally prefer keyword arguments in these cases, since user code becomes more self-explanatory.
Merging this PR as it is would mean introducing an inconsistency now and breaking things later (if we then change DataLoader as well), all for no added benefit, in my opinion.
Let's hear if @darsnack and @ToucheSir have an opinion on this.

I don't think the constructor was duplicated without reason (it's not duplication at all; it forwards to the other constructor). Flux hasn't really been kwarg-forward where it doesn't add value, so we can keep this for now.

In general, I think kwargs add clarity over positional arguments precisely because their position is irrelevant: you can assume the user is not going to remember the placement of arguments beyond the third. That being said, having the RNG first is a standard pattern throughout Julia.

I could see dropout([rng], ...) following this pattern, but it seems really odd for Dropout. For example, in Distributions, you don't construct Normal([rng], ...). I don't think I've ever seen this pattern used in a non-functional form (i.e. constructors). I would also suggest using a keyword argument for the constructor.

Right, we wouldn't want that. More generally, not everything inside that do block may run on the GPU, and it's annoying to write one part of the problem inside this block and the rest outside it just for RNGs. It would make sense for offloading everything to the GPU, for example. I don't think we have a seeding API for CUDA? Even then, that's only relevant for global effects; it may be desirable to have different RNGs for Dropout vs. (say) initialisation. Personally, I would prefer it if we could handle it outside too.

one has enough access to model construction to thread through an RNG to a subset of layers

There are limits to this, but if all someone wanted is to pass a specific RNG to certain types of layers, then that can be done with fmap and the exclude keyword specified appropriately.

Personally, I prefer the solution where layer-specific arguments are possible. It would match how RNGs are passed into most Julia functions, and conceptually, the RNG/RNG state is very much an "input" to the model. I am struggling to see how we make that happen though. How does Flax handle it? Do you have to write a custom forward pass to utilize a non-default RNG?

Wait, so you are advocating for storing the RNG in the layer struct? My intent was to figure out what scenarios would require doing so, and thus far it doesn't seem like there are many. Using different RNGs for the forward pass vs initialization isn't one of them IMO, because those don't happen at the same time (and thus you can seed/scope the RNG for one and not the other).

No, I also prefer not to store the RNG in the struct, but I don't see an alternative. If seeding is the only goal, then this PR is not really needed. If different RNG types are required, then I feel that you need scoping, but that can get tricky with different devices, as Dhairya mentioned. The only option left seems to be the ability to pass layer-specific arguments through on the forward pass. IMO this would be a nice addition if we figured out a good way to do it. It can be used for more than RNGs.

Re fmap: that was basically me pointing out that if we are forced to store the RNG in the struct, swapping it out doesn't require fine-grained access to the model building.

What I was trying to get at (and probably muddled, apologies) is that putting the RNG in the struct isn't necessary to fix #1372. If we instead provided, say, a top-level function to seed all RNGs like PyTorch does, then all this discussion about device compatibility and conversion would be moot. This is to say nothing of edge cases that haven't been discussed: I can already foresee saving/loading behaviour and how it interacts with RNG "tying" across different layers being a headache.

TL;DR adding an RNG struct parameter feels like YAGNI in context and isn't worth the implementation complexity/headaches it'll bring IMO. If we really want to look into this, a more principled investigation (e.g. can we co-opt something like JAX's RNG key design) would be prudent.
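The seed-only alternative can be sketched on the CPU with Julia's own global seeding. `Random.seed!` on the default RNG plays the role of the PyTorch-style top-level seed, and no RNG is stored in any layer. `drop` is a hypothetical helper, and seeding a CUDA RNG is a separate matter not covered here.

```julia
using Random

# Hypothetical dropout that stores no RNG and always draws from the default RNG.
drop(x, p) = x .* (rand(eltype(x), size(x)) .> p) ./ eltype(x)(1 - p)

x = ones(Float32, 6)

Random.seed!(123)   # one global seed, set before the forward pass
a = drop(x, 0.5)

Random.seed!(123)   # reseeding replays the same draws
b = drop(x, 0.5)
```

This covers reproducibility (#1372). It does not let two layers use different RNG types, which is the remaining case argued above.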


maleadt · 2021-06-16T08:45:09Z

do we need to do something differently for it to pick up CUDA.CURAND.default_rng()

No, we have device overrides now:

https://github.com/JuliaGPU/CUDA.jl/blob/ee70b71b620edad627f7dc8aa7e3e385a63f8bb8/src/device/random.jl#L95

EDIT: I misunderstood; that mechanism works only for device code. You'll need additional dispatch to get a RNG suitable for CuArrays.
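The "additional dispatch" mentioned here can be sketched as choosing the RNG from the array type at call time. `FakeGPUArray` and `FakeDeviceRNG` are hypothetical stand-ins for `CuArray` and a CURAND RNG. The exact CUDA.jl entry point for a host-side RNG is not assumed.

```julia
using Random

# Hypothetical stand-in for a GPU array type such as CuArray.
struct FakeGPUArray{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
end
Base.size(a::FakeGPUArray) = size(a.data)
Base.getindex(a::FakeGPUArray, i::Int...) = a.data[i...]

# Hypothetical stand-in for a device RNG.
struct FakeDeviceRNG <: AbstractRNG end

# Pick an RNG that matches where the array lives: plain arrays use the default RNG,
# and the GPU-like type gets the device RNG through an extra method.
rng_for(::AbstractArray) = Random.default_rng()
rng_for(::FakeGPUArray) = FakeDeviceRNG()
```

Each backend needs its own method here, which is the manual per-backend dispatch that the earlier comment worries goes against "write your kernel once and run anywhere".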


darsnack · 2021-06-16T14:59:35Z

src/layers/normalise.jl

 end

 function (a::Dropout)(x)
  _isactive(a) || return x
-  return dropout(x, a.p; dims=a.dims, active=true)
+  return dropout(x, a.p; dims = a.dims, active = true)


This ignores the RNG right now

Right, I did it to see what we can do to ensure the kernel actually runs with a correct RNG; I don't intend to ignore the RNG here when merging.

This is more intended as concrete code for weighing up the different approaches.

ablaom · 2022-01-25T02:24:25Z

@DhairyaLGandhi Right, so what is the status of #1617, then?

darsnack · 2022-01-25T03:35:44Z

See #1849 now.

DhairyaLGandhi added 2 commits June 15, 2021 15:34

add rng to dropout

4ce53a3

add rng to kernel call

c63ded6

darsnack reviewed Jun 15, 2021


DhairyaLGandhi added 2 commits June 16, 2021 11:34

actually disallow bad kernels

1121e7c

replace global_rng with default_rng

0399c37

CarloLucibello reviewed Jun 16, 2021


src/layers/normalise.jl (outdated, resolved)

Update src/layers/normalise.jl

d9f4927

Co-authored-by: Carlo Lucibello <carlo.lucibello@gmail.com>

DhairyaLGandhi added 2 commits June 16, 2021 19:49

add manual cuda dispatch

3a2fcea

Merge branch 'dg/drop' of https://github.com/dhairyagandhi96/Flux.jl into dg/drop

3d3fc1e

darsnack reviewed Jun 16, 2021


DhairyaLGandhi added 2 commits June 16, 2021 21:21

dont ignore rng

2996490

fix adjoint

0cded38

run doctests on latest julia

efc7de3

ToucheSir mentioned this pull request Aug 22, 2021

Frest start : Next steps FluxML/ONNX.jl#49

Open

ablaom mentioned this pull request Dec 13, 2021

fix MLP FluxML/MLJFlux.jl#194

Merged

DhairyaLGandhi closed this Jan 24, 2022

